Transfer entropy is a recently introduced information-theoretic measure quantifying directed statistical 
coherence between spatiotemporal processes, and is widely used in diverse fields ranging from finance to 
neuroscience. However, its relationships to fundamental limits of computation, such as Landauer's limit, 
remain unknown. Here we show that in order to increase transfer entropy (predictability) by one bit, heat 
flow must match or exceed Landauer's limit. Importantly, we generalise Landauer's limit to bi-directional 
information dynamics for non-equilibrium processes, revealing that the limit applies to prediction, in 
addition to retrodiction (information erasure). Furthermore, the results are related to negentropy, and to 
Bremermann's limit and the Bekenstein bound, producing, perhaps surprisingly, lower bounds on the 
computational deceleration and information loss incurred during an increase in predictability about the 
process. The identified relationships set new computational limits in terms of fundamental physical 
quantities, and establish transfer entropy as a central measure connecting information theory, 
thermodynamics and theory of computation. 

Transfer entropy was designed to determine the direction of information transfer between two, possibly 
coupled, processes, by detecting asymmetry in their interactions. It is a Shannon information-theoretic 
quantity which measures a directed relationship between two time-series processes $Y$ and $X$. Specifically, 
the transfer entropy $T_{Y \to X}$ measures the average amount of information that states $\mathbf{y}_n$ at 
time $n$ of the source time-series process $Y$ provide about the next states $\mathbf{x}_{n+1}$ of the 
destination time-series process $X$, in the context of the previous state $\mathbf{x}_n$ of the destination 
process (see more details in Methods): 



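$$ T_{Y \to X} = \left\langle \log_2 \frac{p(\mathbf{x}_{n+1} \mid \mathbf{x}_n, \mathbf{y}_n)}{p(\mathbf{x}_{n+1} \mid \mathbf{x}_n)} \right\rangle \tag{1} $$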



The definition is asymmetric in $Y$ and $X$, hence the labelling of an information source and destination. Intuitively, 
it helps to answer the question "if I know the state of the source, how much does that help to predict the state 
transition of the destination?". 
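
As a rough illustration of definition (1), the following minimal sketch estimates transfer entropy for two discrete time series by simple frequency counting. It assumes a history length of one step and a plug-in (maximum-likelihood) probability estimate; the function name `transfer_entropy` and the toy data are hypothetical and are not part of the original study.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, dest):
    """Plug-in estimate of T_{Y->X} in bits, assuming history length 1."""
    # Collect (x_{n+1}, x_n, y_n) triples from the two aligned series.
    triples = list(zip(dest[1:], dest[:-1], source[:-1]))
    n = len(triples)
    c_xxy = Counter(triples)                          # counts of (x_next, x, y)
    c_xy = Counter((x, y) for _, x, y in triples)     # counts of (x, y)
    c_xx = Counter((xn, x) for xn, x, _ in triples)   # counts of (x_next, x)
    c_x = Counter(x for _, x, _ in triples)           # counts of x
    te = 0.0
    for (xn, x, y), c in c_xxy.items():
        p_with_source = c / c_xy[(x, y)]        # p(x_{n+1} | x_n, y_n)
        p_without = c_xx[(xn, x)] / c_x[x]      # p(x_{n+1} | x_n)
        te += (c / n) * math.log2(p_with_source / p_without)
    return te


# Toy data (an assumption for illustration): X copies Y with a one-step delay.
random.seed(0)
y = [random.randint(0, 1) for _ in range(5000)]
x = [0] + y[:-1]

te_y_to_x = transfer_entropy(y, x)   # close to 1 bit: Y fully predicts the next X
te_x_to_y = transfer_entropy(x, y)   # close to 0 bits: X does not help predict Y
```

The asymmetry between the two estimates reflects the labelling of source and destination: knowing the source state helps predict the destination transition, but not the other way around.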

Following the seminal work of Schreiber, numerous applications of transfer entropy have been successfully 
developed, by capturing information transfer within various domains, such as finance, ecology, neuroscience, 
biochemistry, distributed computation, statistical inference, complex systems, complex networks, 
robotics, etc. Interestingly, maxima of transfer entropy were observed to be related to critical behaviour, e.g., 
average transfer entropy was observed to maximize on the chaotic side of the critical regime within random 
Boolean networks, and was analytically shown to peak on the disordered side of the phase transition in a 
ferromagnetic 2D lattice Ising model with Glauber dynamics. Transfer entropy was also found to be high while 
a system of coupled oscillators was beginning to synchronize, followed by a decline from the global maximum as 
the system was approaching a synchronized state. Similarly, transfer entropy was observed to be maximized in 
coherent propagating spatiotemporal structures within cellular automata (i.e., gliders), and self-organizing 
swarms (cascading waves of motions). 

There is growing awareness that information is a physical quantity, with several studies relating various 
information-theoretic concepts to thermodynamics, primarily through Landauer's principle. In this paper 
we report on a physical interpretation of transfer entropy, and its connections to fundamental limits of 
computation, such as Landauer's limit. 

Landauer's principle, dating back to 1961, is a physical principle specifying the lower theoretical limit of 
energy consumption needed for a computation. It associates the logical irreversibility of functions involved in the 
computation with physical irreversibility, requiring a minimal heat 
generation per machine cycle for each irreversible function. 
According to Bennett, "any logically irreversible manipulation of 
information, such as the erasure of a bit . . ., must be accompanied by 
a corresponding entropy increase in non-information bearing 
degrees of freedom of the information processing apparatus or its 
environment". Specifically, the principle states that irreversible 
destruction of one bit of information results in dissipation of at least 
$kT \log 2$ J of energy into the environment (i.e., a corresponding entropy 
increase in the environment: the Landauer limit). Here $T$ is the 
temperature of the computing circuit in kelvins and $k$ is Boltzmann's 
constant. 
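
To give a sense of scale, the short sketch below evaluates the Landauer limit $kT \log 2$ numerically. The choice of 300 K as an example temperature and the function name `landauer_limit` are assumptions made for illustration only.

```python
import math

BOLTZMANN_K = 1.380649e-23  # Boltzmann's constant in J/K (exact in the 2019 SI)


def landauer_limit(temperature_kelvin):
    """Minimum heat (in joules) dissipated when erasing one bit at temperature T."""
    return BOLTZMANN_K * temperature_kelvin * math.log(2)


# Example temperature of 300 K is an assumption, roughly room temperature.
limit_300 = landauer_limit(300.0)   # on the order of 3e-21 J
```

The bound scales linearly with temperature, so halving the temperature of the circuit halves the minimum heat per erased bit.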

We shall consider the non-equilibrium thermodynamics of a 
physical system $X$ close to equilibrium. At any given moment in time, 
$n$, the thermodynamic state $\mathbf{X}_n$ of the physical system is given by a 
vector $\mathbf{x}_n \in \mathbb{R}^d$, comprising $d$ variables, for instance the (local) 
pressure, temperature, chemical concentrations and so on. A state vector 
completely describes the physical macrostate as far as predictions of 
the outcomes of all possible measurements performed on the system 
are concerned. The state space of the system is the set of all possible 
states of the system. 

The thermodynamic state is generally considered as a fluctuating 
entity so that the conditional probability for a transition from $\mathbf{x}_n$ to 
$\mathbf{x}_{n+1}$, that is, $p(\mathbf{x}_{n+1} \mid \mathbf{x}_n)$, is a clearly defined property 
of the system, and can be accurately estimated by a proper sampling procedure. Each 
macrostate can be realised by a number of different microstates 
consistent with the given thermodynamic variables. Importantly, 
in the theory of non-equilibrium thermodynamics close to equilibrium, 
the microstates belonging to one macrostate $\mathbf{x}$ are equally 
probable. 

A state vector, $\mathbf{y}$, describes a state of some exterior system $Y$, 
possibly coupled to the system represented by $X$. Due to the presence 
or lack of such coupling, the time-series processes corresponding to 
$X$ and $Y$ may or may not be dependent. For a state transition from $\mathbf{x}_n$ 
to $\mathbf{x}_{n+1}$, we shall say that $\sigma(x)_y$ is the internal entropy production of $X$ 
in the context of some source $Y$, while $\Delta S(x)_{ext}$ is the entropy 
production attributed to $Y$, so that (see Methods, (22)): 

$$ \Delta S(x) = \sigma(x)_y + \Delta S(x)_{ext} \tag{2} $$

where $\Delta S(x)$ is the total variation of the entropy of system $X$. 

Henceforth, we shall consider two simple examples illustrating 
entropy dynamics. In both these examples the physical system $X$ is 
surrounded by the physical system $Y$ (note that $X$ is not a component 
of $Y$). The first example is the classical Joule expansion, and the 
second is compression in the Szilard engine-like device. These examples 
are fully described in Methods, illustrating cases with and without 
external entropy production. A comparison between these two 
examples shows that, although the resultant entropy change is the 
same in magnitude, $|\Delta S(x)| = k \log 2$, in both cases, the change is 
brought about differently: (i) for the Joule expansion (of a one molecule 
gas), $\sigma(x)_y = k \log 2$ and $\Delta S(x)_{ext} = 0$, while (ii) for the Szilard 
engine's compression (resetting one bit), $\sigma(x)_y = 0$ and 
$\Delta S(x)_{ext} = -k \log 2$. 

At this stage, we would like to point out two important lessons 
from this comparison. Firstly, as argued by Bennett, "a logically 
irreversible operation, such as the erasure of a bit or the merging of 
two paths, may be thermodynamically reversible or not depending 
on the data to which it is applied". The (computation) paths referred 
to here are trajectories through the state-space or phase portrait of a 
system: if at time step $n + 1$ we reach a global system state $\mathbf{x}_{n+1}$ with 
multiple precursor states $\mathbf{x}_n$, then we have irreversibly destroyed 
a non-zero amount of information. This information is quantified by 
the entropy of the previous state conditioned on the next state, i.e., 
the conditional entropy $h(\mathbf{x}_n \mid \mathbf{x}_{n+1})$ at time step $n + 1$ (see Methods). 
Bennett elaborates that, if the data being erased is random, its erasure 
would represent a reversible entropy transfer to the environment, 
compensating an earlier entropy transfer from the environment during, 
e.g., a previous isothermal expansion. This, as expected, would 
make the total work yield of the cycle zero, in obedience to the Second 
Law. For a bit reset, as for any usual deterministic digital computation, 
the data is not random, being determined by the device's initial 
state. This is a crucial difference, as pointed out by Landauer and 
Bennett. 

It is worth pointing out that the laws "of a closed physical system 
are one-to-one", meaning that in closed physical systems (or the 
universe as a whole) computational paths do not merge, and 
information cannot be destroyed. We can only measure information 
destruction in open systems, and what we measure is a departure of 
information from this system into the external, unobserved environment, 
where the destroyed information is offloaded along with 
energy dissipation. Note that a typical connection between an 
observed computational system and external environment is the 
physical representation of the computational system (e.g. bit registers): 
after all, "information is physical". 

Secondly, we note that for the Szilard engine's compression (resetting 
the bit), there is a decrease in the thermodynamical entropy of 
the one molecule gas by $k \log 2$, and so one may argue that there is an 
increase in predictability about this system transition. It is precisely 
this intuition that was recently formalized via transfer entropy 
capturing the external entropy production. 

Arguably, transfer entropy takes an opposite perspective to 
information destruction, focussing on the ability for prediction in 
coupled systems, rather than uncertainty in retrodiction. In our 
recent work, transfer entropy has been precisely interpreted 
thermodynamically. The proposed thermodynamic interpretation of 
transfer entropy near equilibrium used the specialised Boltzmann's 
principle, and related conditional probabilities to the probabilities of 
the corresponding state transitions. This in turn characterised transfer 
entropy as a difference of two entropy rates: the rate for a resultant 
transition and another rate for a possibly irreversible transition 
within the system affected by an additional source. Then it was 
shown that this difference, the local transfer entropy, is proportional 
to the external entropy production, possibly due to irreversibility. In 
the following sections we revisit the main elements of this approach, 
leading to new fundamental connections with Landauer's principle, 
Bremermann's limit, the Bekenstein bound, as well as the Massieu-Planck 
thermodynamic potential (free entropy). 

Results 

Preliminary results: transfer entropy as external entropy 
production. Supported by the background described in Methods, 
specifically the assumptions (23)-(24) and the expressions (18)-(20), 
transfer entropy can be interpreted via transitions between 
states: 



$$ t_{Y \to X}(n+1) = \log_2 \frac{Z_1}{Z_2} + \frac{1}{k \log 2} \left( \sigma(x)_y - \Delta S(x) \right) \tag{3} $$

When one considers a small fluctuation near an equilibrium, $Z_1 \approx Z_2$, 
as the number of microstates in the macrostates does not change 
much. This removes the additive constant. Using the expression for 
entropy production (2), one obtains 

$$ t_{Y \to X}(n+1) = -\frac{1}{k \log 2} \Delta S(x)_{ext} \tag{4} $$

If $Z_1 \neq Z_2$, the relationship includes some additive constant $\log_2 \frac{Z_1}{Z_2}$. 



That is, the transfer entropy is proportional to the external entropy 
production, brought about by the source of irreversibility $Y$. The 
opposite sign reflects the different direction of entropy production 
attributed to the source $Y$: when $\Delta S(x)_{ext} > 0$, i.e. the entropy due to 
interactions with the surroundings increased during the transition in 
$X$, then the local transfer entropy is negative, and the source 
misinforms about the macroscopic state transition. When, on the other 
hand, $\Delta S(x)_{ext} < 0$, i.e. some entropy produced during the transition 
in $X$ dissipated to the exterior, then the local transfer entropy is 
positive, and better predictions can be made about the macroscopic 
state transitions in $X$ if source $Y$ is measured. 

Turning to our examples (see Methods), we note that (i) for the 
Joule expansion $\Delta S(x)_{ext} = 0$, and so, according to equation (4), 
$t_{Y \to X} = 0$ as well, as the transition is adiabatic and $X$ and $Y$ are independent 
processes, while (ii) for the Szilard engine's compression 
$\Delta S(x)_{ext} = -k \log 2$, and so equation (4) yields $t_{Y \to X} = 1$. That is, in the latter 
case of resetting one bit, the decrease in the thermodynamical 
entropy of the one molecule gas by $k \log 2$ is accompanied by an 
increase in predictability about this state transition, by one bit 
precisely: this increase is captured by the transfer entropy from the 
exterior heat bath $Y$ to the container $X$. 
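
The two examples can be checked numerically with a minimal sketch of relation (4), assuming the near-equilibrium form in which the local transfer entropy equals $-\Delta S(x)_{ext} / (k \log 2)$. The function name `local_te_from_external_entropy` is hypothetical and introduced only for this illustration.

```python
import math

K = 1.380649e-23  # Boltzmann's constant in J/K


def local_te_from_external_entropy(delta_s_ext):
    """Relation (4), assumed form: t = -ΔS_ext / (k log 2), in bits."""
    return -delta_s_ext / (K * math.log(2))


# (i) Joule expansion: no external entropy production, so no transfer.
te_joule = local_te_from_external_entropy(0.0)

# (ii) Szilard engine compression: ΔS_ext = -k log 2, so one bit is gained.
te_szilard = local_te_from_external_entropy(-K * math.log(2))
```

A positive external entropy production would give a negative value, matching the case in which the source misinforms about the transition.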

It is important to realize that local transfer entropy may increase, 
indicating an increase of predictability about a transition, not 
necessarily only when there is an irreversible operation, such as a bit reset. 
In other words, predictability about a transition may be increased for 
a wider range of processes. The transfer entropy then quantifies the 
extent to which the predictability is increased. Furthermore, in 
general, there is a distinction between retrodiction (e.g., bit reset) 
when multiple computational paths converge, and a more complicated 
prediction along potentially diverging forward-looking computational 
paths. This distinction is addressed in the following section. 

Connection to Landauer's limit. Turning our attention to 
quasistatic and not necessarily reversible processes, we note that in 
these processes $Z_1 \approx Z_2$, and under this approximation, equation (4) 
still holds without the additive constant $\log_2 \frac{Z_1}{Z_2}$. Furthermore, under 
constant temperature, the external entropy production is 
$\Delta S_{ext} = \int dq_{ext} / T = \Delta q_{ext} / T$, where $q_{ext}$ represents the heat flow to 
the system from the exterior in the context of the source $Y$. Hence, 

$$ t_{Y \to X}(n+1) = -\frac{1}{k \log 2} \frac{\Delta q_{ext}}{T} \tag{5} $$

In other words, for irreversible but quasistatic processes, local 
transfer entropy is proportional to the heat received or dissipated 
by the system from/to the exterior. 

Thus, we observe that Landauer's limit $kT \log 2$, which associates a 
minimum entropy with a single bit of information, is applicable here 
as well. In particular, for quasistatic processes, using (5), we obtain an 
equality that includes the classical Landauer's limit: 

$$ \Delta q_{ext} = -(kT \log 2) \, t_{Y \to X}(n+1) \tag{6} $$

Both of the considered examples, the Joule expansion of a one molecule 
gas and the Szilard engine's compression (resetting one bit), can 
be interpreted using this equation. For the Joule expansion, $\Delta q_{ext} = 0$ 
due to thermal isolation, and there is a zero transfer entropy $t_{Y \to X} = 0$ 
due to independence of $X$ and $Y$, so either side of the equation is 
trivially zero. During the bit reset by the Szilard engine compression, 
heat is dissipated, yielding $\Delta q_{ext} = -kT \log 2$, while the transfer 
entropy $t_{Y \to X} = 1$, again in agreement with equation (6). 

Landauer inequalities for non-equilibrium dynamics. Depending 
on the processes, heat transfer can occur at different temperatures, 
and, in general, $\int dq_{ext} / T \neq \Delta q_{ext} / T$. Nevertheless, under some 
stronger assumptions outlined in earlier work, the conditional entropies can 
be related to the heat transferred in the transition, per 
temperature, even when temperatures are varying. In a general 
non-equilibrium case, we may consider two cases: (i) when the 
system dissipates heat, transfer entropy is positive, and $Z_1 \le Z_2$: 

$$ t_{Y \to X}(n+1) \le -\frac{1}{k \log 2} \int \frac{dq_{ext}}{T} \tag{7} $$

and (ii) when the system absorbs heat, transfer entropy is negative 
and $Z_1 \ge Z_2$: 

$$ t_{Y \to X}(n+1) \ge -\frac{1}{k \log 2} \int \frac{dq_{ext}}{T} \tag{8} $$

In the first case (a cooling system with positive transfer entropy), the 
negative $\int dq_{ext}/T$ is bounded from above by negative $-k \log 2 \; t_{Y \to X}(n+1)$, 
while in the second case (a heating system with negative transfer entropy), 
the positive $\int dq_{ext}/T$ is bounded from below by positive $-k \log 2 \; t_{Y \to X}(n+1)$. 
Generally, for absolute values, 

$$ \left| \int \frac{dq_{ext}}{T} \right| \ge k \log 2 \, \left| t_{Y \to X}(n+1) \right| \tag{9} $$

For isothermal processes, this reduces to 

$$ \left| \Delta q_{ext} \right| \ge kT \log 2 \, \left| t_{Y \to X}(n+1) \right| \tag{10} $$

For non-isothermal processes, a linear relationship between the 
transferred heat and temperature breaks down. For example, 
transfer entropy of a cycling system interacting with one hot ($T_h$) 
and one cold ($T_c$) thermal reservoirs, and exchanging with the 
surroundings the net heat $\langle q_{ext} \rangle$, can be bounded either as 

$$ 0 \le t_{Y \to X}(n+1) \le -\frac{1}{k \log 2} \left( \frac{1}{T_c} - \frac{1}{T_h} \right) \langle q_{ext} \rangle \tag{11} $$

when $Z_1 \le Z_2$, or 

$$ 0 \le -t_{Y \to X}(n+1) \le \frac{1}{k \log 2} \left( \frac{1}{T_c} - \frac{1}{T_h} \right) \langle q_{ext} \rangle \tag{12} $$

when $Z_1 \ge Z_2$. 

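Hence, we obtain an inequality involving a modified Landauer's 
limit: 

$$ \left| \langle q_{ext} \rangle \right| \ge k \, \frac{\log 2}{\frac{1}{T_c} - \frac{1}{T_h}} \, \left| t_{Y \to X}(n+1) \right| \tag{13} $$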

The expressions (6) and (13) essentially set the "conversion rate" 
between transfer entropy and the dissipated/received heat. 
Specifically, when transfer entropy is increased within the system 
by one bit, $t_{Y \to X}(n+1) = 1$, the dissipated heat must be equal to 
(or larger than the modified) Landauer's limit. The obtained inequality 
is in agreement with the generalised Clausius statement considered 
in earlier studies in the context of information reservoirs and memory 
writing. The modified version of Landauer's principle offered in that context 
appears to have a slight error in the concluding equation [50] where 
the corresponding $(\beta_{cold} - \beta_{hot})$ term is in the numerator rather than 
the denominator. 
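
The "conversion rate" can be illustrated with a small sketch comparing the isothermal bound (10) with the modified bound (13). It assumes that (13) has the form $k \log 2 \, |t| / (1/T_c - 1/T_h)$; the function names and the example temperatures of 300 K and 400 K are hypothetical choices for illustration.

```python
import math

K = 1.380649e-23  # Boltzmann's constant in J/K


def min_heat_isothermal(te_bits, temperature):
    """Bound (10): |Δq_ext| >= k T log 2 |t|, in joules."""
    return K * temperature * math.log(2) * abs(te_bits)


def min_heat_two_reservoirs(te_bits, t_cold, t_hot):
    """Bound (13), assumed form: |<q_ext>| >= k log 2 |t| / (1/T_c - 1/T_h)."""
    if not 0 < t_cold < t_hot:
        raise ValueError("expected 0 < t_cold < t_hot")
    return K * math.log(2) * abs(te_bits) / (1.0 / t_cold - 1.0 / t_hot)


# One bit of transfer entropy, with assumed example temperatures.
one_bit_isothermal = min_heat_isothermal(1.0, 300.0)
one_bit_two_reservoirs = min_heat_two_reservoirs(1.0, 300.0, 400.0)
```

Under these assumptions the two-reservoir bound grows without limit as the two temperatures approach each other, since the denominator in (13) then tends to zero.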

Intuitively, when the system is cooled by losing heat equivalent to 
Landauer's limit, the predictability about the system cannot be 
increased by more than one bit. This interpretation is non-trivial 
because Landauer's limit specifies the amount of heat needed to reset 
one bit of information (limiting retrodiction), i.e., information is 
destroyed because multiple computational paths converge, while 
when dealing with local transfer entropy one considers prediction 
along forward-looking computational paths which may diverge. 
Thus, the suggested interpretation generalises Landauer's limit to 
bi-directional information dynamics for quasistatic processes, and 
generalises the modified Landauer's limit to bi-directional information 
dynamics for non-equilibrium processes (subject to the aforementioned 
additional assumptions specified in earlier work). 

Connection to Bremermann's limit. In this section we analyze how 
fast a physical computing device can perform a logical operation, by 
connecting this problem to the dynamics of predictability captured 
by local transfer entropy. Margolus and Levitin specified this 
question through computational speed: the maximum number of 
distinct states that the system can pass through, per unit of time, 
pointing out that, for a classical computer, this would correspond to 
the maximum number of operations per second. It is well-known 
that this quantity is limited (Bremermann's limit), and is 
immediately connected to how much energy is available for 
information processing, e.g., for switching between distinct states. 
As pointed out by Margolus and Levitin, the rate at which a system 
can oscillate between two distinct states is $\nu_\perp \le \frac{4E}{h}$, where $h$ is the 
Planck constant, and $E$ is fixed average energy, assuming zero of 
energy at the ground state. For a quantum system, where distinct 
states are orthogonal, "the average energy of a macroscopic system is 
equal to the maximum number of orthogonal states that the system 
can pass through per unit of time". The limit is smaller when a 
sequence of oscillations is considered: $\nu_\perp \le \frac{2E}{h}$ for a long evolution 
through orthogonal states. 

The work by Margolus and Levitin strengthened a series of previous 
results which related the rate of information processing to the 
standard deviation of the energy: $\nu_\perp \le \frac{4 \delta E}{h}$. While these previous 
results specified that a quantum state with spread in energy $\delta E$ 
takes time at least $\Delta t = \frac{1}{\nu_\perp} = \frac{h}{4 \delta E}$ to evolve to an orthogonal state, the 
result of Margolus and Levitin bounds the minimum time via the 
average energy $E$: $\Delta t \ge \frac{h}{4E}$, rather than via the spread in energy $\delta E$ 
which can be arbitrarily large for fixed $E$. 
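
For a sense of the time scales involved, the sketch below evaluates the Margolus-Levitin minimum time $h / (4E)$ and the long-sequence rate bound $2E / h$. The example energy of one electronvolt and the function names are assumptions chosen only for illustration.

```python
PLANCK_H = 6.62607015e-34        # Planck constant in J s (exact in the 2019 SI)
ELECTRON_VOLT = 1.602176634e-19  # one electronvolt in joules


def min_orthogonalization_time(avg_energy):
    """Minimum time (s) to reach an orthogonal state: Δt >= h / (4 E)."""
    return PLANCK_H / (4.0 * avg_energy)


def max_rate_long_sequence(avg_energy):
    """Rate bound (1/s) for a long evolution through orthogonal states: 2 E / h."""
    return 2.0 * avg_energy / PLANCK_H


# Assumed example: average energy of 1 eV above the ground state.
dt_one_ev = min_orthogonalization_time(ELECTRON_VOLT)   # roughly a femtosecond
rate_one_ev = max_rate_long_sequence(ELECTRON_VOLT)
```

Doubling the available average energy halves the minimum time, which is the proportionality that the energy-based bound expresses.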

Importantly, Margolus and Levitin argued that these bounds are 
achievable for an ordinary macroscopic system (exemplified, for 
instance, with a lattice gas): "adding energy increases the maximum 
rate at which such a system can pass through a sequence of mutually 
orthogonal states by a proportionate amount". 

Following Lloyd, we interpret Bremermann's limit as the maximum 
number of logical operations that can be performed per second, 
i.e. $\nu = \frac{1}{\Delta t} = \frac{2E}{\pi \hbar}$, where $\hbar$ is the reduced Planck constant, and $E$ 
is the energy of the system. 

We now consider a transition during which the system dissipates 
energy, $\Delta E < 0$, noting that the notation $\Delta E$ for the energy change 
should not be confused with the spread in energy $\delta E$. Then we define 
the change in the maximum number of logical operations per second, 
i.e. the computational deceleration, as $\Delta \nu = \frac{2 \Delta E}{\pi \hbar}$. This quantity 
accounts for how much the frequency of computation $\nu$ is reduced 
when the system loses some energy. This expression can also be 
written as $h \Delta \nu = 4 \Delta E$, relating the energy change to the change in 
frequency. 

Then, generalizing (10) to $|\Delta E| \ge kT \log 2 \, |t_{Y \to X}|$, where $\Delta E$ is the 
heat energy which left the system $X$ and entered the exterior $Y$ during 
an isothermal process, we can specify a lower bound on the computational 
deceleration needed to increase predictability about the 
transition by $t_{Y \to X}$, at a given temperature: 

$$ \Delta \nu \ge \frac{2 \log 2}{\pi \hbar} \, kT \, |t_{Y \to X}| \tag{14} $$

That is, the expression (14) sets the minimum computational deceleration 
needed to increase predictability. For example, considering 
the Szilard engine's compression resetting one bit, we note that the 
computational deceleration within such a device, needed to increase 
predictability about the transition by one bit, is at least $\frac{2 \log 2}{\pi \hbar} kT$, or 
$\frac{4}{h} kT \log 2$. In other words, the product $h \Delta \nu$ is bounded from below 
by the energy equal to four times the Landauer's limit, that is, 
$h \Delta \nu \ge 4 kT \log 2$. 
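
The bound (14) can be evaluated numerically as a minimal sketch. It treats the deceleration as a magnitude, assumes the form $(2 \log 2 / \pi\hbar) \, kT \, |t_{Y \to X}|$, and uses an assumed example temperature of 300 K; the function name `min_deceleration` is hypothetical.

```python
import math

K = 1.380649e-23                   # Boltzmann's constant in J/K
PLANCK_H = 6.62607015e-34          # Planck constant in J s
HBAR = PLANCK_H / (2.0 * math.pi)  # reduced Planck constant in J s


def min_deceleration(te_bits, temperature):
    """Bound (14), assumed form: (2 log 2 / (pi hbar)) k T |t|, in operations/s."""
    return 2.0 * math.log(2) * K * temperature * abs(te_bits) / (math.pi * HBAR)


# One bit of predictability gained at an assumed temperature of 300 K.
dv_one_bit = min_deceleration(1.0, 300.0)

# Consistency check against the equivalent statement h Δν >= 4 k T log 2.
four_landauer = 4.0 * K * 300.0 * math.log(2)
```

Here `PLANCK_H * dv_one_bit` agrees with `four_landauer` up to rounding, which is the "four times the Landauer's limit" statement in the text.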

Connection to the Bekenstein bound. Another important limit is 
the Bekenstein bound: an upper limit $I$ on the information that can be 
contained within a given finite region of space which has a finite 
amount of energy $E$: 

$$ I \le \frac{2 \pi R E}{\hbar c \log 2} \tag{15} $$

where $R$ is the radius of a sphere that can enclose the given system, 
and $c$ is the speed of light. 

While Bremermann's limit constrains the rate of computation, the 
Bekenstein bound restricts an information-storing capacity. It 
applies to physical systems of finite size with limited total energy 
and limited entropy, specifying the maximal extent of memory with 
which a computing device of finite size can operate. 

Again considering a transition during which the system dissipates 
energy, $\Delta E < 0$, we define the change in the maximum information $I$ 
required to describe the system, i.e., the information loss, as 
$\Delta I = \frac{2 \pi R \Delta E}{\hbar c \log 2}$. 

Then the predictability during an isothermal state transition 
within this region, reflected in the transfer entropy $t_{Y \to X}$ from a 
source $Y$, is limited by the information loss $\Delta I$ associated with the 
energy change $\Delta E$ during the transition. Specifically, using 
$|\Delta E| \ge kT \log 2 \, |t_{Y \to X}|$ for any source $Y$, we obtain: 

$$ \Delta I \ge \frac{2 \pi R}{\hbar c} \, kT \, |t_{Y \to X}| \tag{16} $$

While predictability about the system is increased by one bit, at a 
given temperature, i.e., $t_{Y \to X}(n+1) = 1$, there is a loss of at least 
$\frac{2 \pi R}{\hbar c} kT$ from the maximum information contained within (or 
describing) the system. 
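
A minimal numerical sketch of bound (16) follows, treating the information loss as a magnitude and assuming the form $(2 \pi R / \hbar c) \, kT \, |t_{Y \to X}|$. The enclosing radius of one micrometre, the temperature of 300 K and the function name `min_information_loss` are assumptions made only for illustration.

```python
import math

K = 1.380649e-23                          # Boltzmann's constant in J/K
HBAR = 6.62607015e-34 / (2.0 * math.pi)   # reduced Planck constant in J s
SPEED_OF_LIGHT = 299792458.0              # speed of light in m/s


def min_information_loss(te_bits, temperature, radius):
    """Bound (16), assumed form: (2 pi R / (hbar c)) k T |t|, in bits."""
    return (2.0 * math.pi * radius * K * temperature * abs(te_bits)
            / (HBAR * SPEED_OF_LIGHT))


# One bit of predictability, with assumed T = 300 K and R = 1 micrometre.
loss_small = min_information_loss(1.0, 300.0, 1.0e-6)
loss_large = min_information_loss(1.0, 300.0, 2.0e-6)
```

The bound is linear in both the radius and the temperature, so a container twice as large incurs twice the minimum information loss for the same gain in predictability.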

Let us revisit the Szilard engine compression during a bit reset 
operation, this time within a spherical container of fixed radius $R$ 
which dissipates heat energy to the exterior. Such a compressed 
system needs less information to describe it than an uncompressed 
system, and the corresponding information loss about the compressed 
system $\Delta I$ is bounded from below by Landauer's limit scaled 
with $\frac{2 \pi R}{\hbar c \log 2}$. 

It is important to realise that while the system dissipates energy 
and loses information/entropy, the increased predictability is about 
the transition. Therefore, this increased predictability reflects the 
change of information rather than the amount of information in 
the final configuration. Hence, the expressions (14) and (16) set 
transient limits of computation, for computational deceleration 
and information loss, respectively. 

Discussion 

The obtained relationships (6), (9), (13), (14) and (16) specify different 
transient computational limits bounding increases of predictability 
about a system during a transition, represented via transfer 
entropy. These relations explicitly identify constraints on transfer 
entropy in small scale physical systems, operating on the scale of 
the thermal energy $kT$, and being essentially 'quantized' by 
Landauer's limit. These constraints express increases of predictability 
via dissipated heat/energy; set the minimum computational deceleration 
needed to increase predictability; and offset the loss in the 
maximum information contained within a physical system by the 
predictability gained during a transition. Unlike the classical Bremermann's 
limit and the Bekenstein bound, which set the maximum 
computational speed and information, these inequalities specify 
lower bounds, showing that in order to achieve a gain in predictability, 
the transient computational dynamics such as deceleration 
(or information loss) need to operate faster (or contain more) than 
these limits. Understanding these limits and their implications is 
becoming critical as computing circuitry is rapidly approaching these 
regimes. 

Finally, we point out an important relationship between an 
increase of predictability about the system (higher local transfer 
entropy) and negentropy: the entropy that the system exports (dissipates) 
to keep its own entropy low. That is, the expression (4) may 
indicate the general applicability to guided self-organisation in various 
artificial life scenarios, where one would expect that maximising 
transfer entropy corresponds to maximising negentropy. It is known 
that negentropy $\Delta S_{ext} = -\Phi$, where $\Phi$ is the Massieu-Planck 
thermodynamic potential (free entropy). It was shown that maximising 
$\Phi$ is related to stability in several molecular biology contexts (e.g., 
protein stability), and so the suggested thermodynamic interpretation 
associates such increases in stability with increases in transfer 
entropy to the system. One may also argue that the increase of 
stability in biological systems due to a free entropy change (measured 
in bits) is also scaled, and in some cases 'quantised', by Landauer's 
limit. 

Methods 

Preliminaries. Formally, consider a time-series process $X$ of the (potentially 
multivariate) random variables $\{\ldots X_{n-1}, X_n, X_{n+1} \ldots\}$ with process realizations 
$\{\ldots x_{n-1}, x_n, x_{n+1} \ldots\}$ for countable time indices $n$. The underlying state of the process 
$X$ is described by a time series of vectors $\{\ldots \mathbf{X}_{n-1}, \mathbf{X}_n, \mathbf{X}_{n+1} \ldots\}$ with realizations 
$\{\ldots \mathbf{x}_{n-1}, \mathbf{x}_n, \mathbf{x}_{n+1} \ldots\}$, where the multivariate realization $\mathbf{x}_n$ fully describes the state of 
the process at $n$, perhaps using vectors $\mathbf{x}_n = \{x_{n-k+1}, \ldots, x_{n-1}, x_n\}$ for a length $k$ 
Markovian process, or for a thermodynamic process by including all relevant 
thermodynamic variables. If vectors $\{x_{n-k+1}, \ldots, x_{n-1}, x_n\}$ are interpreted as 
embedding vectors, as proxies of hidden states, one should in general be cautious to 
avoid false coupling detections. 

The probability distribution function for observing a realization $\mathbf{x}_n$ is denoted by 
$p(\mathbf{x}_n)$, while $p(\mathbf{x}_{n+1} \mid \mathbf{x}_n) = \frac{p(\mathbf{x}_{n+1}, \mathbf{x}_n)}{p(\mathbf{x}_n)}$ denotes the conditional probability of observing 
realization $\mathbf{x}_{n+1}$ having observed $\mathbf{x}_n$ at the previous time step, where $p(\mathbf{x}_{n+1}, \mathbf{x}_n)$ is the 
joint probability of the two realizations. These quantities are similarly defined for 
process $Y$, for corresponding time indices $n$. 

Transfer entropy. The transfer entropy $T_{Y \to X}$, defined by (1), is a conditional 
mutual information between $\mathbf{Y}_n$ and $\mathbf{X}_{n+1}$ given $\mathbf{X}_n$. Following Fano, we can 
quantify "the amount of information provided by the occurrence of the event 
represented by" $\mathbf{y}_n$ "about the occurrence of the event represented by" $\mathbf{x}_{n+1}$, 
conditioned on the occurrence of the event represented by $\mathbf{x}_n$. That is, we can 
quantify local or point-wise transfer entropy in the same way Fano derived local 
or point-wise conditional mutual information. This is a method applicable in 
general for different information-theoretic measures; for example, local entropy 
or Shannon information content for an outcome $x_n$ of process $X$ is defined as 
$h(x_n) = -\log_2 p(x_n)$. The quantity $h(x_n)$ is simply the information content 
attributed to the specific symbol $x_n$, or the information required to predict or 
uniquely specify that specific value. Other local information-theoretic quantities 
may be computed as sums and differences of local entropies, e.g., 
$h(x_{n+1} \mid x_n) = h(x_{n+1}, x_n) - h(x_n)$, where $h(x_{n+1}, x_n)$ is the local joint entropy. In computing 
these quantities, the key step is to estimate the relevant probability distribution 
functions. This could be done using multiple realisations of the process, or 
accumulating observations in time while assuming stationarity of the process in 
time (and/or space for spatiotemporal processes). 

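As such, the local transfer entropy may be expressed as a difference between local 
conditional entropies: 

$$ t_{Y \to X}(n+1) = \log_2 \frac{p(\mathbf{x}_{n+1} \mid \mathbf{x}_n, \mathbf{y}_n)}{p(\mathbf{x}_{n+1} \mid \mathbf{x}_n)} \tag{17} $$

$$ = h(\mathbf{x}_{n+1} \mid \mathbf{x}_n) - h(\mathbf{x}_{n+1} \mid \mathbf{x}_n, \mathbf{y}_n), \tag{18} $$

where local conditional entropies are defined as follows: 

$$ h(\mathbf{x}_{n+1} \mid \mathbf{x}_n) = -\log_2 p(\mathbf{x}_{n+1} \mid \mathbf{x}_n), \tag{19} $$

$$ h(\mathbf{x}_{n+1} \mid \mathbf{x}_n, \mathbf{y}_n) = -\log_2 p(\mathbf{x}_{n+1} \mid \mathbf{x}_n, \mathbf{y}_n). \tag{20} $$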
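
The difference of local conditional entropies can be sketched in a few lines. The example below assumes that the two conditional probabilities have already been estimated; the function name `local_transfer_entropy` and the example probabilities are hypothetical.

```python
import math


def local_transfer_entropy(p_given_x_and_y, p_given_x):
    """Local transfer entropy in bits, as a difference of local conditional entropies."""
    h_given_x = -math.log2(p_given_x)              # h(x_{n+1} | x_n)
    h_given_x_and_y = -math.log2(p_given_x_and_y)  # h(x_{n+1} | x_n, y_n)
    return h_given_x - h_given_x_and_y


# The source makes the observed transition certain: one bit is gained.
informative = local_transfer_entropy(1.0, 0.5)

# The source makes the observed transition less likely: the local value is negative.
misinformative = local_transfer_entropy(0.25, 0.5)
```

Unlike the average in (1), a local value can be negative, which corresponds to a source that misinforms about the particular transition that occurred.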



Entropy definitions 

The thermodynamic entropy was originally defined by Clausius as a state function 
$S$ satisfying 

$$ S_B - S_A = \int_A^B \frac{dq_{rev}}{T}, \tag{21} $$

where $q_{rev}$ is the heat transferred to an equilibrium thermodynamic system during a 
reversible process from state A to state B. It can be interpreted, from the perspective of 
statistical mechanics, via the famous Boltzmann's equation $S = k \log W$, where $W$ is 
the number of microstates corresponding to a given macrostate. Sometimes $W$ is 
termed "thermodynamic probability" which is quite distinct from a mathematical 
probability bounded between zero and one. In general, $W$ can be normalized to a 
probability $p = W/N$, where $N$ is the number of possible microstates for all 
macrostates. This is not immediately needed as we shall consider (and later normalize) 
relative "thermodynamic probabilities". 

At this stage, we recall a specialization of Boltzmann's principle by Einstein, for 
two states with entropies $S$ and $S_0$ and "relative probability" $W_r$ (the ratio of numbers 
$W$ and $W_0$ that account for the numbers of microstates in the macrostates with $S$ and 
$S_0$ respectively), given by: $S - S_0 = k \log W_r$. Here again the "relative probability" $W_r$ 
is not bounded between zero and one. For instance, if the number of microstates in B 
(i.e., after the transition) is twice as many as those in A (as is the case, for instance, 
after a free adiabatic gas expansion; see the Joule expansion example described in the 
next section), then $W_r = 2$, and the resultant entropy change is $k \log 2$. Thus, the 
"relative probability" depends on the states involved in the transition. 

In general, the variation of entropy of a system $\Delta S = S - S_0$ is equal to the sum of
the internal entropy production $\sigma$ inside the system and the entropy change due to the
interactions with the surroundings $\Delta S_{ext}$:

$$\Delta S = \sigma + \Delta S_{ext}, \qquad (22)$$

so that when the transition from the initial state $S_0$ to the final state $S$ is irreversible, the
entropy production $\sigma > 0$, while for reversible processes $\sigma = 0$.
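As a numerical illustration of the two relations above, the sketch below computes an entropy change from a "relative probability" and then splits it according to equation (22). It is a minimal example under stated assumptions: the logarithm is taken to be natural, $k$ is the Boltzmann constant in SI units, and the function names and the chosen input values are hypothetical.

```python
import math

BOLTZMANN_K = 1.380649e-23  # J/K (exact value in the 2019 SI)


def entropy_change(w_r: float, k: float = BOLTZMANN_K) -> float:
    """S - S_0 = k log W_r, with log assumed to be the natural logarithm."""
    if w_r <= 0:
        raise ValueError("relative probability W_r must be positive")
    return k * math.log(w_r)


def internal_entropy_production(delta_s: float, delta_s_ext: float) -> float:
    """sigma = Delta S - Delta S_ext, rearranged from equation (22)."""
    return delta_s - delta_s_ext


# Illustrative case: the number of microstates doubles (W_r = 2) and no
# entropy is exchanged with the surroundings, so the whole change is
# produced internally and the transition is irreversible (sigma > 0).
delta_s = entropy_change(2.0)
sigma = internal_entropy_production(delta_s, delta_s_ext=0.0)
print(f"Delta S = {delta_s:.3e} J/K, sigma = {sigma:.3e} J/K")
```

If the same entropy change were instead fully accounted for by the exchange with the surroundings, the function would return zero, which is the reversible case.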

Examples of coupled systems. Joule expansion. The first example is the classical Joule
expansion. A container $X$ is thermally isolated from the heat bath $Y$, and there are no
heat exchanges between $X$ and $Y$. A partition separates two chambers of $X$, so that a
volume of gas is kept in the left chamber of $X$, and the right chamber of $X$ is evacuated.
Then the partition between the two chambers is opened, and the gas fills the whole
container, adiabatically. It is well-known that such (irreversible) doubling of the
volume at constant temperature $T$ increases the entropy of $X$, i.e. $\Delta S(x) = nk \log 2$,
where $n$ is the number of particles (cf. the Sackur-Tetrode equation). The same
increase of entropy results from a quasistatic irreversible variant of the Joule
expansion. If there were just one particle in the container (a one molecule gas, as shown
in Fig. 1), the entropy would still increase after the gas expansion, by $\Delta S(x) = k \log 2$,
reflecting the uncertainty about the particle's position with respect to the container's
chambers (left or right).

Figure 1 | The Joule expansion of a one molecule gas. The container $X$ is in
thermal isolation from the surrounding exterior $Y$. As the partition is
removed, the particle may be on the left or on the right.

In this example, there are no interactions between the systems $X$ (the container)
and $Y$ (the exterior). In other words, $\Delta S(x)_{ext} = 0$, and the entropy increase is due to
the internal entropy production. Formally, $\Delta S(x) = \sigma(x)_y$, that is, the internal entropy
produced by $X$ is not affected at all by the system $Y$, because of the total thermal
isolation between $X$ and $Y$.

Szilard engine compression. Now let us consider the second example with the same
container $X$, surrounded by the exterior system $Y$, but now without a thermal isolation
between the two, so that heat can be exchanged between $X$ and $Y$. Two movable
partitions can still divide the container into two separate chambers, as well as
frictionlessly slide along the walls of the container to either side, left or right.
For clarity, we consider only one particle contained in the system $X$, and consider the
dynamics of the partitions as part of the exterior $Y$. This setup is (a part of) the Szilard
heat engine.

When a partition is in the middle of the container, the particle is located either in
the left or the right chamber with equal probability, but eventually collisions between
the particle and the partition force the latter to one side. It is easy to calculate that, as
one partition isothermally moves to the side of the container (at temperature $T$), the
maximum work extracted from the heat bath is $kT \log 2$. In order for the work to be
extracted, one obviously needs to know on which side the particle was located initially.
The most important aspects, in the context of our example, are how this coupled
system modelled on the Szilard engine realizes the logically irreversible operation of
resetting a bit (the operation used to illustrate Landauer's principle), and what the
resultant entropy dynamics are.

A similar physical implementation of this operation is, for instance, considered by
Maroney, as shown in Fig. 2. If the particle is on the left side, then the physical state
represents logical state zero, and if the particle is on the right side, it represents logical
state one. The partition is removed from the middle of the container, allowing the
particle to freely travel within the container. Then the partition is inserted into the far
right-hand side of the container and is slowly (quasistatically) moved to its original
position in the middle. This compression process maintains thermal contact between
the container $X$ and heat bath $Y$. Collisions with the particle exert a pressure on the
partition, requiring work to be performed, and the energy from the work is
transferred to heat in the heat bath $Y$, amounting to at least $kT \log 2$. Such resetting of a bit
to zero, which converts at least $kT \log 2$ of work into heat, is a typical example of
Landauer's principle: any logically irreversible transformation of classical
information is necessarily accompanied by the dissipation of at least $kT \log 2$ of heat per reset
bit. The entropy dynamics is the same for the Szilard engine compression, where one
partition is fixed at the edge and only the second one is moving.

Figure 2 | Resetting one bit by compression in the Szilard engine-like
device. The container $X$ is in thermal contact with the surrounding exterior
$Y$. As the partition is re-inserted and moved back to the middle, the particle
returns to the left side.

It is easy to see that the physical result of such isothermal compression is a decrease
in the thermodynamic entropy of the one molecule gas by $k \log 2$, accompanied by
an increase in the entropy of the environment by (at least) the same amount. That is,
$\Delta S(x) = -k \log 2$. At the same time, as the example shows, the heat is dissipated to the
exterior system $Y$, i.e., compensated by the entropy change due to the interactions
with the surroundings, $\Delta S(x)_{ext} = -k \log 2$, and hence, there is no internal entropy
production, $\sigma(x)_y = 0$.

Assumptions and their illustration in the examples. In an attempt to provide a
thermodynamic interpretation of transfer entropy two important assumptions are
made, defining the range of applicability for such an interpretation. The first one
relates the transition probability $W_{r_1}$ of the system's reversible state change to the
conditional probability $p(x_{n+1} \mid x_n)$:

$$p(x_{n+1} \mid x_n) = \frac{1}{Z_1} W_{r_1}, \qquad (23)$$

where $Z_1$ is a normalization factor, and $W_{r_1}$ is such that $S(x_{n+1}) - S(x_n) = k \log W_{r_1}$.
The normalization factor is equal to the ratio between the total number of microstates
in all possible macrostates at $n+1$ and the total number of microstates in all possible
macrostates at $n$.

We note that the normalization is needed because "relative probability" $W_{r_1}$ is not
bounded between zero and one, while the conditional probability is a properly defined
mathematical probability. For example, in the Joule expansion, the number of
microstates in $x_{n+1}$ is twice the number in the old state $x_n$, making $W_{r_1} = 2$. In
this simple example the state $x_{n+1}$ is the only possible macrostate, and the
normalization factor $Z_1 = 2$ as well, since the number of all microstates after the transition is
twice the number of the microstates in the old state before the transition. Hence, the
right-hand side of the assumption is unity, $\frac{1}{Z_1} W_{r_1} = 1$, concurring with the
conditional probability $p(x_{n+1} \mid x_n) = 1$, as this transition is the only one possible. The
entropy change is $S(x_{n+1}) - S(x_n) = k \log 2$, as expected.

On the contrary, for the Szilard engine's compression example, the number of
microstates in $x_n$ before the transition is twice the number in the new state $x_{n+1}$
after the transition (e.g., partition moves from the far right-hand side to the middle),
and $W_{r_1} = 1/2$. Following Szilard's original design, we allow for two possibly moving
partitions: either from the right to the middle, or from the left to the middle, with both
motions resulting in the same entropy dynamics. Hence, the total number of possible
microstates in all the resultant macrostates is still 2: the particle is on the left, or on the
right, respectively. Therefore, there is no overall reduction in the number of all
possible microstates after the transition from $x_n$ to $x_{n+1}$, so that $Z_1 = 1$. Thus,
$\frac{1}{Z_1} W_{r_1} = 1/2$, setting the conditional probability $p(x_{n+1} \mid x_n) = 1/2$ for the partition
moving from the right. The entropy change is given by
$S(x_{n+1}) - S(x_n) = k \log 1/2 = -k \log 2$.
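The bookkeeping in the two examples of the first assumption can be reproduced with a short sketch. It is only an illustration: the function name, the macrostate labels and the encoding of microstate counts as dictionaries are hypothetical, and the counts are relative numbers (the one-chamber state is assigned a single microstate).

```python
from fractions import Fraction


def first_assumption(before: dict, after: dict, src: str, dst: str):
    """Return (W_r1, Z_1, p) for the transition src -> dst.

    `before` and `after` map each possible macrostate at n and n+1 to its
    (relative) number of microstates.
    """
    w_r1 = Fraction(after[dst], before[src])                  # ratio for this transition
    z_1 = Fraction(sum(after.values()), sum(before.values())) # ratio of totals
    p = w_r1 / z_1                                            # p(x_{n+1} | x_n)
    if not 0 < p <= 1:
        raise ValueError("normalized value is not a valid probability")
    return w_r1, z_1, p


# Joule expansion: one chamber before, the whole container afterwards.
joule = first_assumption({"left": 1}, {"whole": 2}, "left", "whole")

# Szilard compression: whole container before; afterwards the particle is
# confined to the left or to the right, depending on which partition moved.
szilard = first_assumption({"whole": 2}, {"left": 1, "right": 1}, "whole", "left")

print("Joule   (W_r1, Z_1, p):", joule)
print("Szilard (W_r1, Z_1, p):", szilard)
```

Exact fractions are used so that the normalized values can be compared with the probabilities quoted in the text without rounding.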

The second assumption relates the transition probability $W_{r_2}$ of the system's
internal state change, in the context of the interactions with the external world
represented in the state vector $y$, to the conditional probability $p(x_{n+1} \mid x_n, y_n)$.
Specifically, the second assumption is set as follows:

$$p(x_{n+1} \mid x_n, y_n) = \frac{1}{Z_2} W_{r_2}, \qquad (24)$$

for some number $W_{r_2}$, such that $\sigma(x)_y = k \log W_{r_2}$, where $\sigma(x)_y$ is the system's
internal entropy production in the context of $y$. The normalization factor $Z_2$ is again the
ratio between the total numbers of possible microstates after and before the transition.
In general, $Z_1 \neq Z_2$ because $y_n$ may either increase or constrain the number of
microstates in $x_{n+1}$.

For the Joule expansion example, $W_{r_2} = W_{r_1} = 2$, $Z_2 = Z_1 = 2$, with identical
resultant probabilities. The lack of difference is due to thermal isolation, and hence
independence, between $X$ and $Y$, yielding $\Delta S(x) = \sigma(x)_y = k \log 2$.

For the Szilard engine's compression example, $Z_2 = Z_1 = 1$. There is no internal
entropy production, $\sigma(x)_y = k \log W_{r_2} = 0$, entailing $W_{r_2} = 1$. Thus, our second
assumption requires that the resultant conditional probability is set as
$p(x_{n+1} \mid x_n, y_n) = \frac{1}{Z_2} W_{r_2} = 1$. That is, once the direction of the partition's motion is
selected (e.g., from the right to the middle), the transition is certain to reach the
compressed outcome, in context of the external contribution from $Y$.